1.  THE FILE SYSTEM

A file denotes a data collection without regard to physical size or to format. It further denotes the physical
features associated with the file, e.g., the storage cabinet, the microfiche file box, the magnetic tape hardware, the
mass store unit. Also implied by the term file is the human organization which manipulates the physical features
containing the data collection; hence, the user's secretary, his data processing department and possibly the user
could all be equally a part of the file. What we now have is a file system consisting of three major elements:
data, equipment, and humans.

When characterizing a file as large, we often imply that the file tends to be unmanageable and inefficiently
manipulated by currently implemented techniques. A paper file contained in a bulky paper folder is difficult to
manipulate; however, the same file when reduced to microfilm becomes easier to manipulate. File size, with regard
to information content, does not change when converted to microfilm, but access is facilitated. Directories, inventories
and catalogs are typical examples of files which may be "large" in paper format but "small" in microfilm format.
Similarly, a simple relocation of a "large" paper file from a collection of drawer-type filing cabinets to an automated,
or even an optimized, file storage device significantly reduces the apparent file size.

Another implication of large, when characterizing a file, is that the file tends to fill available storage to capacity.
File size is measured in storage units; hence a file becomes large when the number of storage units required by the
file is a significant percentage of available storage units. Expansion of storage capacity may alleviate the problem, but
as the number of individual storage devices increases, the file again becomes unmanageable, hence "large".

A third implication of the term large is that it is used to classify those files which have, or are envisioned to
have, the greatest data content of any known file; i.e., they are physically immense. Obviously this class of file
will both fill available storage devices to capacity and tend to be unmanageable. The methodology applied to this
class of file, however, is no different than that which should be applied to any other file.

Dynamic, also a relative term, implies that a sufficient number of accesses are being made to the file system to
overwhelm some aspect of the system's implementation. This results in degraded response time, hence inconvenience
to the user. To illustrate, a small cabinet paper file is not necessarily dynamic until individuals who access the file
either must wait for access or find that the information is being accessed by others and is unavailable. A file may
be dynamic not only on access but also in any other function associated with the file system. The physically immense
files coming into existence are dynamic because of not only their real-time access but also their real-time update
requirements, which can overwhelm all aspects of the file system.

Figure 1 summarizes the definition of a "large dynamic file". The methodology associated with this type of file
can most generally be defined as the application of optimized devices, both firmware and software, which facilitate
file manipulation and minimize file response time. It is basically no different than the methodology applied to any
other type of file system.


LARGE

- Volume exceeds handling capacity, producing unmanageability
and manipulation inefficiency.

DYNAMIC

- Access requirements overwhelm implementation capability,
degrading response time and inconveniencing users.

FILE

- An information source consisting of a system of data entities,
hardware devices, and human beings.

Fig. 1  Definition Summary

The following paragraphs of the report explore the file system and define its various functions and physical
states. The report's objective is to present at a nontechnical level an insight into what constitutes a file system,
how it evolves, and where special emphasis needs to be placed in its design and implementation. Although generalized
file systems are considered where possible, digitally oriented files are preferentially covered. Emphasis is also placed
on the physically immense files.


1.1  File Function


A file, regardless of size, has functions which can be categorized as Data Collection, Data Conversion, Data
Storage, and Data Retrieval. The degree of sophistication associated with each function is determined by the
complexity of the data processed and by the information dissemination requirements. Figure 2 depicts typical data
flow within a file system.


To be useful, data must be extracted from the file and processed into information; the fact that a file stores
data but a user wants information is important to recognize and understand. Information is derived only when data
is interpreted. Data interpretation to create information is a function normally performed by the data retrieval file
function.


If data is to be available for retrieval, the file must have a collection function. Appropriate collection practices
facilitate all subsequent file functions, and the timely collection of data such that it is available when needed is of
prime importance. No specific data collection procedure is applicable to all data sources; hence, a wide variety of
data collection hardware and software devices are available throughout the industry. In addition, it is not unusual
to find specialized data collection procedures specifically tailored to unique applications.


The data collection function does not necessarily produce data which is compatible with the file storage apparatus.
A conversion function is therefore provided to process collected data and to convert it into a format suitable for
storage.


1.2  Data  Formatting 


To facilitate retrieval, data must be segregated into distinct entities which can be cataloged and referenced. A
significant characteristic of a file data entity is that it is the smallest segment of data which can be accessed by the
file's retrieval function.


A data entity within a file system denotes an ensemble of raw data elements. A raw data element is the
fundamental data constituent and can exist in one of two basic forms: fact or thought. A factual data element is specific;
numbers representing size, weight, temperature or characteristics such as gender, color or location are examples. The
factual data element can be stored within the file by a fixed number of file storage units which are known a priori.
The thought-type data element is abstract; alphanumeric text strings representing letters or documents are examples
of this type of data element. The number of file storage units required to accommodate a thought-type data element
cannot be exactly specified.
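The difference between the two element forms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte layout, the field choices (a temperature and a location code) and the function names are hypothetical and do not come from the report. A factual element packs into a size known in advance, while a thought-type element needs a length prefix because its size is only known once the text exists.

```python
import struct

# Hypothetical layout for a factual element: a 4-byte float (temperature)
# followed by a 2-byte unsigned location code. Its size is known a priori.
FACT_LAYOUT = struct.Struct("<fH")


def encode_fact(temperature: float, location_code: int) -> bytes:
    """Pack a factual element into a fixed number of storage units (bytes)."""
    return FACT_LAYOUT.pack(temperature, location_code)


def encode_thought(text: str) -> bytes:
    """Pack a thought-type element as a 4-byte length prefix plus UTF-8 text."""
    body = text.encode("utf-8")
    return struct.pack("<I", len(body)) + body


def decode_thought(blob: bytes) -> str:
    """Recover the text by reading the length prefix first."""
    (length,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + length].decode("utf-8")
```

Every encoded fact occupies `FACT_LAYOUT.size` bytes, whereas the space taken by `encode_thought` depends on the text supplied.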


Basic data elements are collected together to form a data entity; however, this does not preclude the possibility
that a file's data entity can itself be a single data element. Data entities can be composed of both element forms;
hence, the data entity can be either fixed size or variable size as measured in file storage units required to contain
the elements. Fixed or variable are the more common terms used to characterize a data entity, and often the term
"data record" is used synonymously for the term "data entity", especially in electronic data processing environments.


When accessed and retrieved from file storage, elements can be processed along with other elements from other
entities. This constitutes the process of generating information. Data in its fundamental form or even when collected
into entities is not necessarily information. Information implies that knowledge is imparted to an individual and this is
a critically important consideration when defining the architecture of a file system. It is a simple task to configure a file
capable of inundating a user with data from which little or no information can be derived. It is a nontrivial design
problem to create a file system which can supply information to a user at the proper time and place needed.


Unfortunately, many files, and especially physically immense files, have evolved without proper design guidelines
and do not yield information at the rate or in the quantity needed. Minor file system redesigns can alleviate some
problems, but unless the total file system is considered and a total system approach established, the attained
performance is substantially less than desired.


1.3  The File Cycle


When captured by the file system's collection function, a data element is launched on an evolution process which
ends in purging. Throughout the cycle a data element will exist in various physical states as shown in the file cycle
state diagram, Figure 3. A data element cannot simultaneously exist within the file system in two states; however,
the storage state is an exception to this rule since access to a data element often does not physically remove the data
element from the storage device.

Connections between states represent all possible transition channels, whether automated or manually accomplished.
A time interval is associated with a state change; for example, the access time to a memory represents (or is a part of)
the transition time from a storage state to the access state.
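The file cycle described above can be modelled as a small state machine. The sketch below is illustrative only: Figure 3 is not reproduced here, so the state names and the set of permitted transition channels are assumptions drawn loosely from the surrounding text, and the names `State`, `TRANSITIONS`, `advance` and `run_cycle` are hypothetical.

```python
from enum import Enum


class State(Enum):
    COLLECTION = "collection"
    PROCESSING = "processing"
    STORAGE = "storage"
    ACCESS = "access"
    PURGE = "purge"


# Assumed transition channels (illustrative; not taken from Figure 3).
TRANSITIONS = {
    State.COLLECTION: {State.PROCESSING},
    State.PROCESSING: {State.STORAGE},
    State.STORAGE: {State.ACCESS, State.PURGE},
    # Retrieval may return the element, feed it back for reprocessing,
    # or eliminate it from the system.
    State.ACCESS: {State.STORAGE, State.PROCESSING, State.PURGE},
    State.PURGE: set(),  # terminal state: the file cycle is complete
}


def advance(current: State, target: State) -> State:
    """Move an element along a channel, rejecting transitions with no channel."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"no channel from {current.value} to {target.value}")
    return target


def run_cycle(path):
    """Walk an element from capture through the given sequence of states."""
    state = State.COLLECTION
    for target in path:
        state = advance(state, target)
    return state
```

For example, `run_cycle([State.PROCESSING, State.STORAGE, State.ACCESS, State.PURGE])` walks one element through a complete cycle, while any attempt to leave the purge state raises an error.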

The primary purpose of the file system's data collection function is to capture raw data elements. The collection
process, therefore, must be capable of manipulating all data elements which are to be entered in the file system.
Based on the data element form, many unique devices and activities may be involved simultaneously in the data
collection function. Temporary storage may also exist within this function.

Processing converts the raw data element into a format suitable for storage and assembles the elements into
entities. It is not unreasonable to employ different storage devices, hence to require a variety of processing activities
within the file system. It is also not unusual, and often it is most efficient, to combine data element conversion as
an integral part of the collection function.

Most data element transitions involve a storage state, which is the most common file state and which is often
the limiting feature with regard to file capacity and response efficiency. Possibly because of this, industry has for
several decades been pressing to develop larger capacity and faster access memories, not only for digital format data,
but for paper and film formats as well.

Retrieval of data elements and their conversion to information is the key file function. If it is not adequately
addressed, the file system closely approximates the proverbial (and useless) write-only memory. The output of the
retrieval function is information and may consist of reports, tables, graphs or other forms of assimilated data elements.
It is also not unusual for the output of the retrieval function to cause stored data elements to be updated or to
generate new data elements which must be entered into the file system. These characteristics of the retrieval function
represent a data element feedback path which must be as carefully considered in the configuration of a file system
as an electronic feedback path is considered in the design of an electrical system.

The purge state of the retrieval function is another important state and represents the complete elimination of
a data element from the system. In some situations, the purged data element may be recaptured, but in most
situations the element is irretrievably lost. The purge state is often considered to be synonymous with the archival storage
state. This is incorrect, and if the archival store state is used to retain elements which should be purged, the file
system becomes clogged and inefficient. Obviously, when a data element reaches the purge state, its file cycle has
been completed.

1.4  System  Objectives 

A file's mere existence does not necessarily imply that the file is accomplishing much more than data storage.
To be truly useful, a file must furnish information which can impart knowledge. It is therefore extremely important
to establish what information is to be derived from the file; this is equivalent to defining what specific data elements
must be made available, how they are to be cataloged and referenced, and how they are to be processed.

A file which presents only those data elements necessary to extract a specific item of information is an exception
rather than a rule among commonly encountered file systems. The common file system procedure is to overwhelm a
user with data elements in hopes that he will be able to sort through the elements and generate the desired
information. This has long been a major problem associated with Management Information Systems (MIS). These electronic
data processing software systems were created to supply relevant information in a timely fashion to senior corporate
management such that they could more effectively make management decisions. They probably represent one of
the initial forms of data management systems, and a great deal has been learned about file systems methodology
during the evolution of MIS into a viable and useful tool.

Definition of file system objectives is not an easy task, but since the file is to be implemented with physical
devices which perform only specific functions, it is an absolutely necessary task. As with any system design, the first
questions to ask are who will use the file, what information will he seek, and what data elements are needed to
support the information generation process. Having formulated answers to these three basic questions, the file system
architect can optimally configure a design which addresses specified objectives.

Of additional importance, the architect must configure the file system to meet the changing needs of a dynamic
environment. The one overriding information sciences theorem which has the greatest impact on file system design
is that data element criticalness and importance change with time. Consequently, file system objectives established
during a definition procedure change slightly throughout the life of the file system. A well-defined file system
facilitates the capture of new data element classes, the deletion of old classes, the regrouping of elements within
entities, and the introduction of new information generation processes.


2.  DATA  COLLECTION 

Data collection is an integral part of the file system; in fact, it is the foundation of the file system. Appropriate
collection practices affect all subsequent file functions; timely and efficient data collection minimizes information
delay to the user, and accurate data collection optimizes the system's effectiveness.

2.1  Data  Capture 

Sensors capture raw data and temporarily store it, hopefully in a format which facilitates conversion. The
human eye is one of the most commonly used data sensors in a manual data collection activity. Temporary storage
for data collected in this fashion typically is a paper form which is marked in some fashion by the person collecting
the data. If consideration is given to the total file system concept, a format and marking procedure can be implemented
which automates the conversion function; hence, data transition times between file cycle states are reduced. Naturally
a machine-readable format facilitates processing; however, if subsequent manual sorting or review is a requirement of
the collection function, a human-readable format must also be maintained.

Automated sensors and, more specifically, remote electronic sensors are also involved in the data collection
process. In these situations data collection can easily overwhelm the file system, particularly the conversion function,
and raw data must be stored for extended periods in its capture format (possibly on analog magnetic tape) for
subsequent conversion. This procedure, if present, is usually indicative of an uncoordinated, piecewise file system
implementation which did not fully consider all aspects of the system definition.

2.2  Implementation Concepts

Implementation of equipment and procedures to capture data implies that the type of information which the
file system is to provide has been defined and that necessary input data elements have been identified. Each data
element's format (i.e., paper, film, magnetic or electrical) and its size characteristic suggest possible equipment
implementations. Data element priority and availability plus economic factors also contribute significantly to the
physical implementation of the sensor utilized to capture the data element. A further consideration is the output
format of the data element when transferred to the conversion function for processing. The file collection function,
therefore, is implemented by an assortment of sensing devices whose purpose is to capture data elements and present
them to the conversion function.

To illustrate these concepts, consider the capture of the raw data element temperature. Numerous sensor types
and procedures can be quickly identified and range from an individual person observing the reading of a thermometer
to a thermoelectric device producing an electrical signal. The collection function, however, is not complete until
the raw data has been recorded in a format which is transferable to the conversion function. Hence, the captured
data elements must be transcribed to some format which is reaccessible. Again a broad range of techniques is
available, which for the above example may be anything from a slip of paper in the human case to a strip recorder or
magnetic tape for the thermoelectric device. Each sensor mechanism and formatting technique produces different
response times, accuracies and efficiencies; the selection of the optimum approach can only effectively be guided
by the file system's ultimate objective.

2.3  Data  Collection  Trends 

Trends in file system implementation are toward automated sensors which capture data elements and transfer
them directly to the file conversion function. For example, significant technology advances are being made in
point-of-sale sensing equipment which immediately updates inventories and performs real-time accounting functions. This
technology can also be readily applied to a wide variety of data collection requirements.

In the processing of alphanumeric text, optical character recognition (OCR) technology is available. OCR
equipment, however, is still limited in capability because only a limited number of different fonts can be recognized
by any one piece of equipment and a human operator is needed to aid the equipment in defining unrecognized characters.
For those organizations which have control over text generation, these limitations are minimal and data collection
is greatly facilitated by the application of OCR.

National and international earth satellite data collection systems are other examples of very sophisticated sensor
systems. Also noted in current trends is the fact that collection and conversion functions are becoming less distinct.
It is quite common to observe sensing subsystems which not only capture data elements but also immediately process
these data elements for direct transmission to file storage. Hopefully this will be an increasing trend in earth satellite
systems, which need some form of data element selection process to reduce the tremendous data volume presented to
processing centers.

3.  DATA CONVERSION

The file's conversion function assembles data elements into entities which can be conveniently stored and
accessed. Often the conversion function transforms data from one format to another, physically, mathematically, or
by means of some other procedure which fulfills file objectives and implementation constraints.

3.1  Entity  Formatting 

The grouping of data elements into entities is determined by a number of factors including logical relationships
between elements, element utilization in information generation procedures, and physical attributes associated with
the file system equipment. Logical relationships are definitely the major factors affecting the arrangement of elements
into entities; however, file system efficiency tradeoffs exist with respect to entity size selection and file system data
manipulation attributes.

Most file systems are record or volume oriented so that fixed physical quantities of stored units are manipulated.
For example, the microfiche file manipulates a fixed film chip size even though not all storage units (image cells) on
the microfiche are utilized. Digital files similarly manipulate fixed size physical records of digital data which are
subdivided into binary words; within a record not all words are necessarily utilized. The conversion process,
therefore, must overlay the physical attributes of the file, especially those of the storage function, with a logically assembled
data entity whose size characteristics often do not match the file's physical data manipulation attributes.

Incompatibility of logical entity size and physical storage record size is especially common in electronic data
processing file systems. Often the matching of logical entity sizes to physical storage characteristics is not addressed,
since data management software tends to mask physical attributes of the storage media. The end result of this
fallacy, however, is similar to that obtained by storing legal size documents in a standard sized filing cabinet;
substantial storage space is wasted. Also it is often not even possible to achieve an optimized match since a specific file
entity format must be overlaid on existing equipment facilities. However, when both entity format definition and
equipment data manipulation attributes can be traded off to yield an optimized match, significant improvements can
be achieved in file system data handling efficiency.
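The cost of a mismatch between logical entity size and physical record size can be estimated with simple arithmetic. The following is a minimal sketch, assuming each entity must occupy a whole number of fixed-size physical records; the function names and the sample sizes are hypothetical and are not taken from the report.

```python
def records_needed(entity_size: int, record_size: int) -> int:
    """Whole physical records required to hold one logical entity."""
    if entity_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-entity_size // record_size)  # integer ceiling division


def wasted_units(entity_size: int, record_size: int) -> int:
    """Storage units allocated to the entity but left unused."""
    return records_needed(entity_size, record_size) * record_size - entity_size


def utilization(entity_size: int, record_size: int) -> float:
    """Fraction of the allocated storage that actually holds entity data."""
    allocated = records_needed(entity_size, record_size) * record_size
    return entity_size / allocated
```

Under these assumptions a 300-word entity stored in 256-word records occupies two records and leaves 212 words unused, while the same entity in 100-word records wastes nothing. This is the kind of tradeoff between entity format and equipment attributes that the text describes.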

Data conversion, especially with regard to entity formatting, is closely tied to the data management protocol
established by the retrieval function. Although the actual process of formatting data is accomplished by the
conversion function, the formatting procedure is established by data retrieval techniques. Hopefully a data management
procedure, as discussed in Section 5, will provide an entity format scheme which is independent of any one specific
information generation procedure. Such a format scheme eliminates element storage redundancy since unique entity
formats are not required for each information generation procedure.

3.2  Element Transformation

Transformation of data elements from one form to another can be as simple as a conversion between units of
measure or as complex as the redundancy removal procedures associated with imagery data. Several basic
factors motivate element transformation. First, transformation can be required because the information generation
process is optimized by using the transformed elements. Secondly, data in its transformed state may require fewer file
system resources to process and maintain; normally an inverse transformation is required for this type of data element
prior to its utilization in the information generation process.
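Both motivations can be illustrated briefly. In the sketch below, the unit conversion stands for a transformation made because the transformed element suits information generation, and general-purpose lossless compression from Python's standard `zlib` module stands in for a resource-saving transformation that must be inverted before use. The compression is a substitute chosen for illustration, not the imagery redundancy removal the report refers to, and the function names are hypothetical.

```python
import zlib


def fahrenheit_to_celsius(deg_f: float) -> float:
    """Simple transformation: conversion between units of measure."""
    return (deg_f - 32.0) * 5.0 / 9.0


def to_stored_form(raw: bytes) -> bytes:
    """Resource-saving transformation applied before storage."""
    return zlib.compress(raw, 9)


def from_stored_form(stored: bytes) -> bytes:
    """Inverse transformation applied before information generation."""
    return zlib.decompress(stored)
```

Highly redundant input shrinks considerably under `to_stored_form`, and `from_stored_form` restores it exactly, which is the round trip the paragraph above requires of this second kind of transformation.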
4.  DATA STORAGE

Three fundamental categories or storage states can be overlaid on the data storage function, and typically a file
system divides its storage facilities into segments associated with each category.

Dynamic store is the primary data store category and represents that facility from which the retrieval function
most frequently extracts file data elements. The executive's personal desk file and a computer's core/disk memory
are examples of dynamic store. Characteristics of dynamic store include fast and preferably random access to data
entities contained within the storage mechanism.

The second category of storage is inactive storage and is characterized by easy accessibility, but access time is
significantly slower than for dynamic store. Parallel examples are the executive secretary's file cabinet and the
computer facility's magnetic tape library.

Archival store is the third storage category; the warehousing of non-current records or of magnetic tapes is an
illustrative example. Archival store is characterized by extremely slow and often inconvenient access procedures
but should not be confused with the purge state of the retrieval function.
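The three storage categories, and the distinction between archival storage and purging, can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the class name `TieredStore`, its methods, and the rule that an entity lives in exactly one category at a time are not taken from the report.

```python
from typing import Optional

# Categories ordered from fastest to slowest access.
TIERS = ("dynamic", "inactive", "archival")


class TieredStore:
    def __init__(self) -> None:
        self._tiers = {name: {} for name in TIERS}

    def put(self, key: str, entity: str, tier: str = "dynamic") -> None:
        """Store an entity in one category, removing any older copy."""
        if tier not in self._tiers:
            raise ValueError(f"unknown tier: {tier}")
        self.purge(key)
        self._tiers[tier][key] = entity

    def locate(self, key: str) -> Optional[str]:
        """Return the category holding the entity, searching fastest first."""
        for name in TIERS:
            if key in self._tiers[name]:
                return name
        return None

    def get(self, key: str) -> str:
        tier = self.locate(key)
        if tier is None:
            raise KeyError(key)
        return self._tiers[tier][key]

    def demote(self, key: str) -> str:
        """Move an entity one category slower; archival entities stay put."""
        tier = self.locate(key)
        if tier is None:
            raise KeyError(key)
        index = TIERS.index(tier)
        if index == len(TIERS) - 1:
            return tier  # archival is still storage, not elimination
        target = TIERS[index + 1]
        self._tiers[target][key] = self._tiers[tier].pop(key)
        return target

    def purge(self, key: str) -> None:
        """Eliminate the entity from every category."""
        for name in TIERS:
            self._tiers[name].pop(key, None)
```

In this model an entity demoted to the archival category is still retrievable with `get`; only `purge` removes it from the system.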

Although  the  fundamental  file  storage  categories  are  applicable  to  all  classes  and  types  of  file  systems,  the 
paragraphs  of  this  section  address  only  those  file  systems  which  are  digitally  oriented. 

4.1  Digital  Storage  Alternative 

A range of storage devices is again available to address the digital data dynamic store configuration.
Semiconductor, core, and magnetic media are most prevalent and currently serve as industry standards. A recent memory
device development, the charge coupled device (CCD), is adding an additional memory storage capacity which falls
between core and random-access magnetic media in both capacity and access time. CCD memory cost similarly falls
between the two: greater than magnetic media costs, but less than random-access memory. Other relatively recent
memory implementation additions are the floppy disk and magnetic tape cassette, which represent a different way to
package and utilize magnetic media. Naturally each configuration addresses some applications very effectively, but an
improperly applied memory device can significantly reduce overall system efficiency.

The impetus of digital storage development has been the demand to make storage capacity bigger, access faster,
and cost per bit cheaper. For many years, numerous technological areas have been actively investigated in hopes of
developing new devices and techniques which would achieve these objectives.

One major area of development activity has addressed electron beam memories (EBM) and has fluctuated between
periods of enthusiasm and despair. The small size of an electron beam provides a potential for high digital data
packing density; however, materials technology does not easily yield devices which compete with magnetic technology.
Recent refinements in EBM storage media, however, have significantly improved leakage characteristics which affect
long term data storage, and improved high voltage power supply components have allowed electron beam addressing
problems to be overcome. As a result, an EBM with 10 to 30 megabits of storage capacity is projected to be
commercially available this year (1976). The unit is anticipated to be a disk replacement and will have storage costs
comparable to magnetic disks but with improved performance characteristics, primarily reduced access time to stored
data. Hopes of using EBM technology in large scale memories having (I0'>) to (10'*) bits of storage capacity still
exist in certain segments of industry, but near term utilization within this decade seems remote.

Another technology being actively pursued is the magnetic bubble memory. This technology offers significant
potential but is best summarized as being still in the research stage. No applications oriented device is anticipated
this decade.

Optical memory technology is a third highly active area which offers substantial potential but which has been
handicapped by materials technology. During the past several years a considerable effort has been undertaken by
industry to apply optical techniques to a broad spectrum of memory and storage applications. The research is
directed toward both intermediate-capacity, fast access time read/write memory applications and large-capacity,
longer access time read-only storage devices. Read/write memories have not yet emerged from the research
laboratory, since efforts have been hampered by the unavailability of suitable materials which are needed to configure
several key memory components.

The prospects for read-only optical memories are significantly better because problems associated with high-speed
input data formatters and reusable storage media are obviated. Optical memories using metallized Mylar strips have
been commercially available from Precision Instrument Company and read-only optical memories using standard
photographic film are currently in various stages of development by other companies.

4.2 Storage Characteristics

The two most widely used performance measures for memories are capacity and access time. Clearly there are
many other factors such as transfer rate, size, power consumption, interface ease, reliability and reproducibility which
may play equally important roles in characterizing memory performance.

Figure  4 shows  conventional  memory  technology  state  of  the  art  in  terms  of  capacity  and  access  time. 

Technology ranges from the relatively small but fast semiconductor memories through moving head disk memories
to the larger and slower bulk storage devices such as magnetic tape. Clearly most memory and storage technology is
confined to magnetic phenomena. The exceptions to the magnetic dominance are at the low-capacity, fast access
end with semiconductor technology and at the large-capacity, slow access end with the IBM 3850, the Ampex
Terabit Memory, the Precision Instrument Model 190 bit-by-bit optical memory and the Control Data Corporation
38500 mass storage systems.

The trends in memory and storage technology indicate a gradual (although sometimes rapid) shift toward larger
and faster devices. Memory trends also indicate a decrease in cost per bit of storage. The significant reduction in
semiconductor memory cost is a most dramatic illustration of this characteristic.

When determining memory cost, both the total system cost and the storage media cost must be considered.
Conventional computer magnetic storage system costs range from 2(10*’) to 6(10*’) cents/bit and storage media costs
are approximately l(T’ cents/bit. If one considers only on-line storage, an increase in capacity means that additional
hardware must be added; therefore, the cost per bit remains relatively constant as capacity increases.

Mass digital storage systems are currently available from industry and provide storage capacities up to 10” bits
with some companies claiming 10” system capacities. Costs for these systems average around 2(I(T*) cents/bit and,
as would be expected in a competitive environment, costs are relatively consistent between manufacturers.

4.3 Storage Hierarchy

A multiplicity of data storage devices is utilized to solve the file system's storage requirements. As a result,
data storage is distributed over a network of storage devices whose characteristics range from high speed and high
cost-per-bit to low speed and low cost-per-bit. The network of storage devices is configured into a hierarchy which
utilizes high speed memory to store those data elements currently being processed and low speed memory for data
entity storage.

As the file's memory capacity is expanded, devices are employed which physically must access increasingly
larger blocks of data for each storage or retrieval request; hence, data entities must be further assembled into larger
ensembles. This process of concatenating data ensembles into larger data segments also forms a logical data hierarchy
which must overlay the storage hierarchy.

One extreme of the storage device hierarchy is characterized by the computer processor's data register which
provides high speed data element access. The hierarchy progresses up through random-access core or semiconductor
memory, to random-access disk or drum storage and finally to large scale mass storage devices which are typically
magnetic tape oriented with storage capacities in the 10” to 10” range. All storage devices within the hierarchy
are normally read/write devices employing magnetic technology to accomplish data storage.
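The ordering of devices in the hierarchy can be made concrete with a small sketch. The tier names below follow the text; the capacities and access times are placeholder values assumed purely for illustration and are not figures from this report. The function `fastest_tier_that_fits` is likewise a hypothetical helper, not part of any described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_bits: float   # assumed placeholder value
    access_time_s: float   # assumed placeholder value


# Ordered from the fast, small extreme to the slow, large extreme.
HIERARCHY = [
    Tier("processor data register", 1e2, 1e-7),
    Tier("core or semiconductor memory", 1e6, 1e-6),
    Tier("disk or drum storage", 1e9, 1e-2),
    Tier("mass storage (magnetic tape oriented)", 1e12, 1e1),
]


def fastest_tier_that_fits(size_bits: float) -> Tier:
    """Return the fastest tier whose capacity can hold size_bits."""
    for tier in HIERARCHY:
        if size_bits <= tier.capacity_bits:
            return tier
    raise ValueError("data set exceeds the largest tier in the hierarchy")
```

A data set is placed in the fastest device able to hold it, which mirrors the trade between speed and cost-per-bit described above; a real design would also weigh access frequency and block size.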


A recent addition to the storage hierarchy has been the archival mass memory. Since magnetic media suffers
degradation during readout and over extended storage, the need for archival data storage is gaining urgency. To
date, this need has been met by optical memories using bit-by-bit recording techniques. Holographic techniques,
however, offer more cost-effective approaches. Holographic storage devices are currently operational in prototype
form and one system will become fully operational within the next year. This system and other holographic mass
storage systems under development by Harris Corporation are discussed in a subsequent paragraph.

A common way to evaluate storage hierarchies is to establish a generalized information creation task which is
then executed. Since this is done after the file system has been designed and implemented, very little can be done
to optimize the storage hierarchy. The literature defines numerous analytical hierarchical memory design approaches which
can be employed by the file system architect. As with the design of other file functions, the degree of optimization
of the file storage function is directly dependent upon the effectiveness of the file system's operational objective
definition.


4.4 Read-Only Optical Memories

Read-only holographic memories typically use film as the recording media. Once exposed, the film record is
removed from the recorder, developed by normal techniques, and placed in a holding area until data retrieval is
required. If any portion of the recorded data must be changed or updated, the entire record must be re-recorded
and replaced within the memory. Read-only memories, therefore, are ideally suited for archival applications where
updating is relatively infrequent.


Recent advances in materials and components have allowed the production of prototype holographic memories
as is demonstrated by several special purpose hardware systems being developed by Harris Corporation's Electronic
Systems Division. Synthetic holography has been successfully applied to the storage of digital data in the Human
Read/Machine Read (HRMR) System developed under contract with the Rome Air Development Center. A research
prototype, shown in Figure 5, was delivered in May 1973 and an engineering prototype version is currently in the
final stages of fabrication.


The  HRMR  System  addresses  the  document  storage,  retrieval  and  dissemination  problem  which  is  impacting 
both  government  and  industrial  complexes  having  large  document  data  bases.  The  HRMR  concept  is  based  upon 
annotating  a standard  microfiche  with  the  digital  equivalent  of  the  associated  images.  Optical  readout  of  the  digital 
data  directly  from  the  microfiche  facilitates  storage,  retrieval  and  dissemination  of  data  to  both  local  and  remote 
locations.


Fig. 5 HRMR holographic memory research equipment (labeled components: controller module, CRT display unit, modem unit)


A direct extension of the concept is the full utilization of the microfiche film chip for digital data recording.
Thirty megabits of user data per microfiche film chip is presently being realized at a packing density exceeding one
megabit per square inch. Since this packing density is significantly below theoretical limitations, considerable
improvement can be anticipated as components and techniques are refined. Further, the synthetic recording
technique has been shown to be compatible with the developing dry silver film technology which eliminates wet
film processors.


The HRMR engineering prototype system collects microfiche film chips into a mass data store which has a
maximum capacity of 2(10") bits. A physical data block size on a single microfiche chip is approximately 433
kilobits and represents the smallest addressable and retrievable physical data segment. Because of the chip rather
than tape media format, data accessibility is enhanced; any physical data block within the total store can be retrieved
in less than «S seconds. True archivability of the standard silver halide film recording media and nondestructive
readout are additional advantages.
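The relationship between chip capacity and block size can be checked with a short calculation. The sketch below takes the two figures as printed (thirty megabits of user data per chip, blocks of approximately 433 kilobits) and assumes decimal units and that both figures measure user data; neither assumption is stated in the text.

```python
USER_BITS_PER_CHIP = 30e6   # thirty megabits of user data per chip (from the text)
BLOCK_BITS = 433e3          # approximate physical data block size (from the text)


def blocks_per_chip(chip_bits: float = USER_BITS_PER_CHIP,
                    block_bits: float = BLOCK_BITS) -> int:
    """Whole physical data blocks that fit within one chip's user capacity."""
    return int(chip_bits // block_bits)


if __name__ == "__main__":
    print(blocks_per_chip())  # 69 under the stated assumptions
```

Under these assumptions each chip holds on the order of 69 addressable blocks, so a retrieval request resolves to a chip and then to one of a few dozen blocks on it.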

While the use of synthetic holography for storage of digital data on microfiche provides a solution to some
mass memory requirements, different recording techniques and physical record formats are more suitable to other
types of applications. For example, the storage and retrieval of digital data in very large data records at extremely
fast recording and readout data rates can be best handled using roll film and interferometric holography.

As with magnetic tape recorders, the large supply of continuously moving recording media allows very large
(i.e., tens to hundreds of megabits) data records to be recorded and played back with little interface buffering. Roll
film formats also allow sustained data processing at hundreds of megabits per second. Thus, holographic recording
on roll film offers an extension of the large data buffer capabilities now offered by high speed instrumentation-type
magnetic tape recorders. Currently, single transport magnetic tape recorders can operate at recording and playback
speeds of up to 80 or 90 Mb/s and can store data at a linear density of about 600 Kb/inch on one-inch wide tape.
In comparison, holographic techniques can be used to record and reproduce digital data on single transport devices
at several hundred megabits per second at linear packing densities that are at least six times greater than now practical
with magnetic tape.

The Wideband Holographic Recorder Exploratory Development Model, also under development by the Harris
Corporation Electronic Systems Division under contract with the Rome Air Development Center, uses roll film
format and an interferometric recording approach. An operational breadboard has demonstrated the recording and
readout of data at rates up to 600 Mb/s. Using interferometric Fourier transform holography, about 3(10^11) bits
of data can be recorded on a 5000-foot roll of 35 mm film. At recording rates of up to 500 Mb/s, nonstop recording
could be sustained for approximately 9.5 minutes. Readout rates reduced by 10:1 or 100:1 from record rates are
easily implemented. Although recording is readily accomplished using roll film supplies, once recorded, the film can
be segmented and cassette mounted when faster, more random access data or distribution of duplicate data packs is
desired.
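The roll capacity, recording rate and recording time quoted above are related by simple arithmetic, which the following sketch checks. It assumes a roll capacity on the order of 3(10^11) bits and a roll length of 5000 feet; both are readings chosen because they are consistent with the stated 500 Mb/s rate and 9.5 minute duration, and should be treated as assumptions rather than confirmed figures.

```python
ROLL_CAPACITY_BITS = 3e11            # assumed reading: about 3(10^11) bits per roll
RECORD_RATE_BPS = 500e6              # 500 Mb/s recording rate (from the text)
ROLL_LENGTH_FEET = 5000              # assumed reading of the roll length
TAPE_DENSITY_BITS_PER_INCH = 600e3   # 600 Kb/inch on one-inch magnetic tape (from the text)


def recording_minutes(capacity_bits: float, rate_bps: float) -> float:
    """Minutes of nonstop recording before the roll is full."""
    return capacity_bits / rate_bps / 60.0


def linear_density_bits_per_inch(capacity_bits: float, length_feet: float) -> float:
    """Average number of bits stored per linear inch of film."""
    return capacity_bits / (length_feet * 12.0)


if __name__ == "__main__":
    minutes = recording_minutes(ROLL_CAPACITY_BITS, RECORD_RATE_BPS)
    density = linear_density_bits_per_inch(ROLL_CAPACITY_BITS, ROLL_LENGTH_FEET)
    print(f"recording time: {minutes:.1f} min")
    print(f"film-to-tape linear density ratio: {density / TAPE_DENSITY_BITS_PER_INCH:.1f}")
```

With these inputs the recording time works out to 10 minutes, in line with the quoted 9.5 minutes once the rounding of the capacity figure is allowed for, and the linear density ratio comes out above the "at least six times" advantage claimed over magnetic tape.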




Section V


DATA RETRIEVAL


5. DATA RETRIEVAL

The collection, conversion and storage of data elements collectively make up what is commonly referenced as
a "data base", although the term is somewhat ambiguous and ill-defined. If one were to ask an organization's senior
level manager what comprised his data base, a typical response would suggest that only machine-readable digital data
was included. In actuality, an organization's data base is much more complex and includes microfilm and paper files,
as well as those "special" desk files of individual personnel throughout the organization. It also includes the
hardware facilities and people used to store, process and transfer data elements.

Data is very much an organization's resource which must be carefully preserved and managed. Like cash
resources, data is useless if it is unavailable when needed. All too often, data is integrated into specialized functions
of a given department or a given individual. When a need suddenly arises to utilize the data in other areas, it is
often frustratingly difficult if not impossible to access the required data elements.

Data base management is integrally associated or is synonymous with data access, and the successful management
of data is crucial to the success of any organization, regardless of whether that organization is a government, company,
or individual. Data management includes not only the formatting of data and its manipulation, but also the control
and allocation of all file system resources: firmware, software and personnel. Unfortunately, data management is
often restricted to just the control of data files by a software system and does not encompass the total file system.

The management requirements associated with large data files have become so complex and critical that a new
technology area is emerging: the science of Information Management. Previously, procedures and techniques of this
science have evolved in a haphazard and uncoordinated manner. Each data management application within each
organization has developed discretely from others and has produced great waste and inefficiency. Only recently have
unified approaches been documented. Applications techniques are also just beginning to materialize in the form of
hardware systems and electronic data processing software.

5.1 Data Base Management

The most common approach to data base management is to store data elements along functional lines of the
organization. Even within this structure, data is organized around special activities of the functional organization.
As new activities are identified, redundant data is created to satisfy the new requirements. Similarly, if another level
of organization needs access to the data, the data is copied and transferred. A typical office is a perfect example of
the redundancy associated with human-readable paper records; each individual must or strongly feels that he should
have his own personal copy of key memos, correspondence and reports. The fundamental reason for this attachment
to redundant files is accessibility. Redundancy can be eliminated by means of a management system utilizing
centralized files; unfortunately these are often difficult and time consuming to access. To be effective, therefore,
any data base management technique must make data accessible while still accomplishing a primary objective of
reducing redundancy within the file system.

Since data exists in either human-readable form or machine-readable form, data bases can be and are configured
around each form. The major human-readable form of a data base is the micrographics file; the major machine-
readable form is the electronic data processing, digital data base. Naturally, technology of both areas can be merged
to form a hybrid data base which can provide unique advantages in special situations.

The most common data base form is the digital data base which facilitates the processing of data elements into
information and the dissemination of the information to remote locations. The digital data base can be configured
into a self-contained facility which can be accessed only by programmers or it can be oriented around a specialized
computer language which allows easy access to and manipulation of large quantities of data by nonprogrammers.
This specialized software is called the Data Base Management System (DBMS).

Commonly available data base management software systems address only a limited segment of data base types
such as accounting, inventory control, and other commercial functions. Specialized systems are being developed to
address applications unique to specific professional areas such as the medical and legal professions. The big gap in
data base management software is in the scientifically oriented data base. These applications are normally characterized
by large, variable length data elements, and large, scientifically oriented file systems normally generate
their own management techniques which satisfy individual and unique access requirements. The ultimate objective
of these management techniques is no different from other applications areas; i.e., the objective is to eliminate data
element storage redundancy and to facilitate data element access.

5.2 Management Systems Software

A data base management system (DBMS) software package usually is coresident with a computer's operating
system and serves to intercept and service all input/output requests for data entity retrieval and storage. Consequently,
file security and integrity procedures are easily implemented within the DBMS package. Applications programs, which
are a digital file's primary information generator, access the DBMS facilities by means of macro-type instructions
which define new files, establish new data hierarchies or retrieve old files and hierarchies for processing. The DBMS
facilities may be accessed both by normal host language compilers such as COBOL or FORTRAN or by special
DBMS applications programs which allow a nonprogrammer to easily manipulate the data base.
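The reason security is easy to add once every request passes through one layer can be shown with a minimal sketch. The class `MiniDBMS` and its methods are hypothetical names invented for this illustration; they do not correspond to the interface of any package listed in this paper.

```python
class MiniDBMS:
    """Toy stand-in for a DBMS layer; not any vendor's interface."""

    def __init__(self):
        self._files = {}        # file name -> {key: value}
        self._permissions = {}  # (user, file name) -> set of allowed operations

    def define_file(self, user, name):
        """Create a new file; the creator may read and write it."""
        self._files[name] = {}
        self._permissions[(user, name)] = {"read", "write"}

    def grant(self, user, name, *operations):
        """Allow a user to perform the given operations on a file."""
        self._permissions.setdefault((user, name), set()).update(operations)

    def _check(self, user, name, operation):
        # Every request is intercepted here, so one check covers all access paths.
        if operation not in self._permissions.get((user, name), set()):
            raise PermissionError(f"{user} may not {operation} {name}")

    def store(self, user, name, key, value):
        self._check(user, name, "write")
        self._files[name][key] = value

    def retrieve(self, user, name, key):
        self._check(user, name, "read")
        return self._files[name][key]
```

Because applications never touch `_files` directly, the single `_check` call is the only place where a security or integrity rule needs to be written.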

A multitude of commercially available DBMS software packages exist and a nonexhaustive list is provided in
Table I to illustrate this fact. Each package provides distinct characteristics which can effectively accommodate
certain categories of file systems. Some packages are machine oriented while some are machine independent.
Naturally, the machine independent versions provide an additional measure of flexibility if hardware changes are
anticipated. Also, the machine independent packages are usually generated by nonhardware oriented vendors and
must be efficient and cost-effective if the supplier is to stay in business.

TABLE I

Commercially Available DBMS Software

System Acronym      Vendor Source
--------------      -------------
ADABAS              Software AG
CICS                IBM
Data Com/DB         Computer Information Management Corporation
DBMS-10             Digital Equipment Corporation
DMS-II              Burroughs Corporation
DMS/90              Sperry Univac
DMS 1100            Sperry Univac
IDMS                Cullinane Corporation
I-D-S/I             Honeywell Information Systems
IMS-VS              IBM
IMS-2               IBM
INQUIRE             Infodata Systems Incorporated
Model 204           Computer Corporation of America
System 2000         MRI Systems
TOTAL               Cincom Systems

Without exception, generation of new software, for any purpose, is an expensive undertaking. Utilization of
existing software is less expensive, less time consuming, and less risky than the do-it-yourself route. Consequently,
commercially available DBMS software is recommended when implementing or updating a file system.

5.3 Implementation Considerations

A DBMS system is only as good as the facilities, both software and firmware, which are available to it.
Although applications software programs are normally involved in the process of information generation, the human
who initiated the retrieval procedure is also involved. In many instances, and particularly in the access of dynamic
files, the human is very actively involved. This individual performs a real-time analysis of the information presented
to him. From the knowledge he gains, he introduces data elements into the file system or initiates additional
information generation activities within the file system. Obviously the file must be highly dynamic as the newly
introduced data elements should (must) be available in real time for subsequent processing. This is not an easy
task for most general purpose computers, especially large-scale computer systems which tend to be input/output
constrained.

The retrieval function, therefore, must not only retrieve and transform data elements into information but also
must present this information in human detectable form, normally human-readable form. Alphanumeric printers
form one class of output devices and range from a simple teletype to a high speed printer. This class also includes
CRT terminals. Another output class is the image output device which includes the graphics terminal, television
display and facsimile reproducer. Other less common classes of output devices stimulate the human's audio, smell
or touch sensory systems.

In addition, implementation considerations of the retrieval function include presentation of the data at the
place where needed. Often this implies remotely locating the output terminal. The implementation of even this
capability has been facilitated by the development of modems which allow remotely located devices to interact over
standard communications networks.

Section VI

CONFIGURING THE FILE SYSTEM


6. CONFIGURING THE FILE SYSTEM

When an organization identifies a need to establish a file, it typically strives with great diligence to implement
the file system using general purpose electronic data processing (EDP) equipment which exists within the organization.
There are certainly strong economic factors for making such considerations. Unfortunately, however, the common
approach taken is one of tailoring and constraining file system requirements around current equipment units rather
than one of determining how and what existing equipment can be used to optimally implement total file system
objectives. Naturally, if the common approach is taken, operational results do not meet anticipated performance
specifications. It is, therefore, informative to review cursorily the development history of general purpose EDP
equipment and the typical evolution of file systems which are processed by this equipment.


Computers were first configured to replace manual activities in accounting functions. As their versatility grew
they expanded to other functional areas, but normally these computers still processed a single job stream in serial
fashion. The processing unit of these early computers (and in many of today's computers too) was idle a majority
of the time waiting for the input/output (I/O) facilities to supply data or procedures. This condition caused
improved I/O networks and peripheral devices to be developed with faster I/O transfer capacities. Processor unit
improvements continued in parallel with I/O improvements such that typical system configurations were still I/O
bound.

Availability of sophisticated software monitor systems which introduced memory partitioning and multitask
processing significantly helped but still did not fully alleviate the I/O problem. In general, EDP systems continued
to grow bigger and to increase the capability to process larger quantities of data, but jobs were still serial or batch
processed and sophisticated algorithms were used to schedule jobs such that I/O throughput was optimized.

As seen by this discussion, a major disadvantage to the big EDP system was accessibility. Users submitted jobs
and went on to other activities until results were available, minutes, hours, or days later. Typical changes were then
made to the job and it was resubmitted. The cycle continued until adequate answers were obtained or until
frustration caused inadequate, hence incomplete, knowledge to be tolerated. The "time-share" approach was then
developed which allowed multiple users to effectively use the EDP system simultaneously and with "immediate"
response. Unfortunately, when numerous users simultaneously accessed the EDP system, I/O limitations again were
encountered and immediate response degraded to seconds or minutes. Naturally, even this degraded response was
significantly better than the previous batch processing approach, but human nature has a way of quickly expecting
the best response at all times. Special activities are even established around best performance and it then becomes
a necessity.

As files evolved in an organization, its general purpose computer system was most likely utilized in some aspect
of the file's processing activities. The EDP equipment also continued to be utilized for routine data processing
activities associated with the normal functions of the organization. When new processing functions were identified
or when new file related functions were implemented, storage was expanded, more I/O channels were added (hence
more peripherals), a faster processing unit was incorporated, remote job entry capability was added and even time
share features to remote access terminals may have been added. The ultimate feature to be considered was the mass
store device. The EDP facility typically grew into a big inhomogeneous monster full of redundancy and inefficiency.

To help solve these problems the user was finally forced to implement some form of a DBMS to handle its data
files. As noted above, DBMS systems are compatible with and operate in conjunction with computer operating system
software. The addition of a DBMS then becomes an expansion in software just as the addition of another peripheral
is an expansion in firmware. In many situations the DBMS does provide a viable solution and allows file manipulation
tasks to be handled on existing EDP hardware in what might be classified as a centralized, single processing unit
system configuration. In many other situations, the DBMS does not provide an adequate solution primarily because
generalized EDP equipment, often configured from a single vendor's product line, is applied to handle very specific
file system problems. Additionally, routine computational functions may be simultaneously competing with the file
system for EDP resources.

The physically immense, dynamic file system design requires that special attention be given to the I/O structure
of the system implementation. As noted above, centralized systems funnel all I/O requests into a single processor
unit. To handle the I/O throughput problem, which is a significantly critical area of the centralized system, additional
EDP equipment can be added to concentrate and stage I/O transactions to the central processor.

Having introduced I/O concentration into the EDP system, it now becomes highly economical to locate the
concentrator in the general area of the peripheral devices which feed it. In the case of remote access terminals, the
concentrator may be physically separated from the central site by a significant distance. What has now happened
is that the centralized EDP system has been slightly decentralized.

Although economic factors were probably the main reason for initial decentralization of I/O facilities, it was
quickly discovered that portions of all file functions could be decentralized to remote locations such that total file
system response was drastically improved. The limit of decentralization is the distributed file system which applies
the "divide to conquer" concept to overcome I/O constraints of the centralized concept. In the distributed system,
those portions of the file functions which are closely related and dynamically interactive are segregated into satellite
subsystems. The satellite subsystems intercommunicate to allow data element interchange where required.

Satellite subsystems do not necessarily have to be large scale EDP systems. Fortunately, as EDP systems grew
in size and capacity, many applications were identified which could more effectively use a small processor with
limited performance. This was the impetus for the minicomputer technology development which now yields
minicomputer systems having data processing power far superior to so called large EDP equipment of a decade ago.

The evolution of data processing devices is still in process and technology has now yielded the microprocessor
which is typically used in non-dynamic programming situations. File systems implementation alternatives, therefore,
include large scale, mini, and micro computer configurations.


A trend in the world today is to continuously collect more and more data in increasingly bigger files. Those
organizations having the biggest physical data collection, however, are often faced with the same file manipulation
problems as organizations having relatively small files. This can be attributed to the fact that proper file
manipulation equipment, both firmware and software, has not been utilized in the file system implementation configuration.


To assure that a file system meets performance expectations, an organization must establish precise file system
objectives and procedures. All functions of the file must be clearly defined and all file states must be identified.
Implementation devices may then be overlaid on the definition and performance parameters of alternate approaches
may be traded off within each file function area.

The above statements are not to imply that all problems in the handling of a large data mass have been effectively
solved. Significant problems still remain to challenge available technology which often is found lacking in required
capability. One especially critical area in which technology must be advanced is the area of document conversion
into digitally processable formats. Document conversion problems encompass not only textually oriented documents
but also maps, graphs, photographs, etc.

Accuracy and conversion efficiency are still fundamental and overwhelming problems even though substantial
technical activity has been expended for many years in both areas. For example, character recognition is still limited
to basic fonts and requires operator intervention and monitoring. Graphic arts conversion is typically slow even for
automatic document digitizers which yield tremendous quantities of data. Manual digitizing techniques yield more
efficient data formats, but are substantially slower and often inaccurate. The problems with map conversion are
basically a composite collection of all of these problems compounded by the large fundamental data volume associated
with a map. Although map conversion and storage is currently tasking available technology beyond its limits, the
need to digitally store and process image data is rapidly reaching a point of urgency. Unfortunately, image data
volume is at least an order of magnitude larger than map data.

Although numerous techniques and equipments have been developed, none represents a true panacea for the
document conversion problems. Optical techniques are possibly offering the greatest potential and are being actively
explored. The fallout of this activity can be expected to continuously improve current techniques.

Document conversion is not the only area in which technology can be advanced. Data storage and data retrieval
are both areas in which technology is striving to provide improved methods and devices. Data storage has had several
major breakthroughs as noted in Section IV. Data retrieval, however, is still an area which needs improved methods.
Most current retrieval schemes are based on the generation of cross-reference indexes. These work efficiently as long
as the data can be accessed by one of the cross-referenced parameters which was established at file definition time.

Naturally this is not the normal case: hence, some rapid mechanism for providing an efficient global search of the
data base for any parameter must be developed. Also, as noted in the previous section, generalized file structures
which facilitate data retrieval and processing are still needed for many classes of data storage systems.
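
The contrast between retrieval through a cross-reference index and a global search can be illustrated with a minimal sketch. It is not drawn from any system described in this report: the records, the field names and the function names (build_index, indexed_lookup, global_search) are hypothetical and serve only to show why a query on a parameter that was not indexed at file definition time falls back to an examination of every record.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative assumptions only.
RECORDS = [
    {"id": 1, "author": "Smith", "year": 1975, "subject": "maps"},
    {"id": 2, "author": "Jones", "year": 1976, "subject": "microfilm"},
    {"id": 3, "author": "Smith", "year": 1976, "subject": "storage"},
]


def build_index(records, field):
    """Build a cross-reference index: field value -> list of record ids."""
    index = defaultdict(list)
    for rec in records:
        index[rec[field]].append(rec["id"])
    return index


def indexed_lookup(index, value):
    """Retrieve ids through the index; cost does not grow with file size."""
    return list(index.get(value, []))


def global_search(records, field, value):
    """Retrieve ids for any field by examining every record in the file."""
    return [rec["id"] for rec in records if rec.get(field) == value]


# The only index established at "file definition time" is on author.
AUTHOR_INDEX = build_index(RECORDS, "author")

by_author = indexed_lookup(AUTHOR_INDEX, "Smith")   # served by the index
by_year = global_search(RECORDS, "year", 1976)      # no index: full scan
```

In this sketch the author query touches only the index entry, while the year query must read the whole file; on a physically immense file that difference is the retrieval problem described above.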


This  lack  of  readily  available  and  efficient  techniques  and  equipment  further  emphasizes  the  need  to  accurately 
define  a dynamic  file's  purpose  and  function.  Only  then  can  implementation  techniques  be  traded  off  effectively  such 
that  realistic  approaches  are  established. 

This report discusses what constitutes a file system, how it evolves and where special
emphasis needs to be placed in its design and implementation. Although generalized
file systems are considered where possible, digitally oriented files are preferentially
covered. Emphasis is also placed on the physically immense files.

The file functions of collection, conversion, storage and retrieval are presented and
the file cycle associated with a data element is discussed on the basis of a generalized
file system. Also discussed are file system objectives and their importance in the
definition of an implementation configuration.

This Report has been prepared at the request of the Technical Information Panel of
AGARD.